
    Extreme Parkour with Legged Robots

    Humans can perform parkour, traversing obstacles in a highly dynamic fashion that demands precise eye-muscle coordination and movement. Getting robots to do the same requires overcoming similar challenges. Classically, this is done by independently engineering perception, actuation, and control systems to very low tolerances, which restricts robots to tightly controlled settings such as a predetermined obstacle course in a lab. In contrast, humans learn parkour through practice without significantly changing their underlying biology. In this paper, we take a similar approach to developing robot parkour on a small, low-cost robot with imprecise actuation and a single front-facing depth camera whose output is low-frequency, jittery, and prone to artifacts. We show how a single neural-net policy operating directly on camera images, trained in simulation with large-scale RL, can overcome imprecise sensing and actuation to produce highly precise control behavior end-to-end. Our robot can perform a high jump onto obstacles 2x its height, a long jump across gaps 2x its length, do a handstand, run across tilted ramps, and generalize to novel obstacle courses with different physical properties. Parkour videos at https://extreme-parkour.github.io/
    Comment: Website and videos at https://extreme-parkour.github.io

    A quantum-inspired tensor network method for constrained combinatorial optimization problems

    Combinatorial optimization is of general interest for both theoretical study and real-world applications. Fast-developing quantum algorithms provide a different perspective on solving combinatorial optimization problems. In this paper, we propose a quantum-inspired algorithm for general locally constrained combinatorial optimization problems that encodes the constraints directly into a tensor network state. The optimal solution can then be found efficiently by borrowing imaginary-time evolution from quantum many-body physics. We demonstrate our algorithm numerically on the open-pit mining problem. Our computational results show the effectiveness of this construction and its potential application to general combinatorial optimization problems.
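    The core idea — penalize infeasible configurations, then filter the superposition toward low-cost states with imaginary-time evolution — can be illustrated with a brute-force state-vector toy (not the paper's tensor-network construction; the cost function and constraint below are hypothetical):

```python
import numpy as np

# Toy problem: 3 binary variables, maximize 2*x0 + 3*x1 + x2 subject to the
# local constraint x0 >= x1 (a stand-in for open-pit precedence constraints).
n = 3
states = [tuple((s >> i) & 1 for i in range(n)) for s in range(2 ** n)]

def cost(x):
    if x[1] > x[0]:            # constraint violated -> huge penalty excludes state
        return 1e6
    return -(2 * x[0] + 3 * x[1] + x[2])   # maximize profit = minimize negative

H = np.array([cost(x) for x in states])    # diagonal "Hamiltonian" over states
psi = np.ones(2 ** n) / np.sqrt(2 ** n)    # uniform initial superposition

tau = 0.5
for _ in range(50):
    psi = psi * np.exp(-tau * H)           # imaginary-time step exp(-tau * H)
    psi /= np.linalg.norm(psi)             # renormalize

best = states[int(np.argmax(psi ** 2))]    # dominant amplitude = optimum
print(best)   # -> (1, 1, 1)
```

The tensor-network method avoids this exponential state vector by encoding the constraint structure into the network itself; the toy only shows why repeated application of exp(-tau H) concentrates amplitude on the constrained optimum.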

    A Dynamic Graph Interactive Framework with Label-Semantic Injection for Spoken Language Understanding

    Joint models for multi-intent detection and slot filling are gaining traction since they are closer to complicated real-world scenarios. However, existing approaches (1) focus on identifying implicit correlations between utterances and one-hot encoded labels in both tasks while ignoring explicit label characteristics; and (2) directly incorporate multi-intent information for each token, which can lead to incorrect slot predictions due to the introduction of irrelevant intents. In this paper, we propose a framework termed DGIF, which first leverages the semantic information of labels to give the model additional signals and enriched priors. A multi-grain interactive graph is then constructed to model correlations between intents and slots. Specifically, we propose a novel approach to constructing the interactive graph based on the injection of label semantics, which can automatically update the graph to better alleviate error propagation. Experimental results show that our framework significantly outperforms existing approaches, obtaining a relative improvement of 13.7% over the previous best model on the MixATIS dataset in overall accuracy.
    Comment: Submitted to ICASSP 202
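    The "label-semantic injection" idea — letting tokens attend to embeddings of the label *names* rather than one-hot ids — can be sketched as follows (shapes and the attention form are our illustrative assumptions, not the DGIF implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
T, L, D = 5, 4, 8                     # tokens, labels, hidden size
tokens = rng.normal(size=(T, D))      # utterance token features
label_emb = rng.normal(size=(L, D))   # encoded label descriptions, e.g. "B-fromloc"

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Token-to-label attention injects label semantics into each token feature.
attn = softmax(tokens @ label_emb.T / np.sqrt(D))  # (T, L)
injected = tokens + attn @ label_emb               # label-aware token features
print(injected.shape)                              # (5, 8)
```

A one-hot label carries no content beyond its index; an encoded label name gives every token a semantic prior over the label space, which is the extra signal the abstract refers to.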

    G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory

    Recent video grounding works attempt to introduce vanilla contrastive learning into video grounding. However, we claim that this naive solution is suboptimal. Contrastive learning requires two key properties: (1) alignment of features of similar samples, and (2) uniformity of the induced distribution of the normalized features on the hypersphere. Video grounding suffers from two troublesome issues: (1) the co-existence of some visual entities in both the ground truth and other moments, i.e., semantic overlapping; and (2) only a few moments in the video being annotated, i.e., the sparse-annotation dilemma. As a result, vanilla contrastive learning cannot model the correlations between temporally distant moments and learns inconsistent video representations, making it unsuitable for video grounding. In this paper, we introduce Geodesic and Game Localization (G2L), a semantically aligned and uniform video grounding framework via geodesic distance and game theory. We quantify the correlations among moments using the geodesic distance, which guides the model to learn the correct cross-modal representations. Furthermore, from the novel perspective of game theory, we propose semantic Shapley interaction based on geodesic-distance sampling to learn fine-grained semantic alignment in similar moments. Experiments on three benchmarks demonstrate the effectiveness of our method.
    Comment: ICCV202
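    One way to read the geodesic idea is to measure moment similarity by arc distance on the unit hypersphere and down-weight negatives that are geodesically close to the positive (and thus likely overlap semantically). The weighting scheme below is our illustrative sketch, not the authors' loss:

```python
import numpy as np

def geodesic_weighted_nce(query, moments, pos_idx, tau=0.07):
    """Contrastive loss where each negative moment is weighted by its
    geodesic (arc) distance to the query on the unit hypersphere."""
    q = query / np.linalg.norm(query)
    m = moments / np.linalg.norm(moments, axis=1, keepdims=True)
    cos = m @ q                                          # cosine similarities (N,)
    geo = np.arccos(np.clip(cos, -1.0 + 1e-6, 1.0 - 1e-6))  # arc distance in [0, pi]
    w = geo / geo.max()            # geodesically near moments contribute less
    w[pos_idx] = 1.0               # the positive keeps full weight
    scores = np.exp(cos / tau) * w
    return -np.log(scores[pos_idx] / scores.sum())

rng = np.random.default_rng(1)
query = rng.normal(size=16)           # query-sentence feature
moments = rng.normal(size=(8, 16))    # candidate moment features
loss = geodesic_weighted_nce(query, moments, pos_idx=0)
```

This softens the hard positive/negative split of vanilla contrastive learning: semantically overlapping moments are no longer pushed apart as forcefully as truly unrelated ones.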

    Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation

    Automatic radiology report generation has attracted enormous research interest due to its practical value in reducing the workload of radiologists. However, simultaneously establishing global correspondences between the image (e.g., a chest X-ray) and its report, and local alignments between image patches and keywords, remains challenging. To this end, we propose an Unify, Align and then Refine (UAR) approach to learn multi-level cross-modal alignments, introducing three novel modules: a Latent Space Unifier (LSU), a Cross-modal Representation Aligner (CRA), and a Text-to-Image Refiner (TIR). Specifically, the LSU unifies multimodal data into discrete tokens, making it flexible to learn common knowledge among modalities with a shared network. The modality-agnostic CRA first learns discriminative features via a set of orthonormal basis vectors and a dual-gate mechanism, and then globally aligns visual and textual representations under a triplet contrastive loss. The TIR boosts token-level local alignment by calibrating text-to-image attention with a learnable mask. Additionally, we design a two-stage training procedure that lets UAR gradually grasp cross-modal alignments at different levels, imitating radiologists' workflow: writing sentence by sentence first and then checking word by word. Extensive experiments and analyses on the IU-Xray and MIMIC-CXR benchmark datasets demonstrate the superiority of our UAR against varied state-of-the-art methods.
    Comment: 8 pages, 6 figures, 4 tables

    Exploiting Prompt Caption for Video Grounding

    Video grounding aims to locate a moment of interest matching a given query sentence in an untrimmed video. Previous works ignore the sparsity dilemma in video annotations, which fails to provide the contextual information between potential events and query sentences in the dataset. In this paper, we contend that exploiting easily available captions that describe general actions, i.e., prompt captions (PC) as defined in our paper, will significantly boost performance. To this end, we propose a Prompt Caption Network (PCNet) for video grounding. Specifically, we first introduce dense video captioning to generate dense captions and then obtain prompt captions by Non-Prompt Caption Suppression (NPCS). To capture the potential information in prompt captions, we propose Caption Guided Attention (CGA) to project the semantic relations between prompt captions and query sentences into temporal space and fuse them into visual representations. Considering the gap between prompt captions and ground truth, we propose Asymmetric Cross-modal Contrastive Learning (ACCL), which constructs more negative pairs to maximize cross-modal mutual information. Without bells and whistles, extensive experiments on three public datasets (i.e., ActivityNet Captions, TACoS and ActivityNet-CG) demonstrate that our method significantly outperforms state-of-the-art methods.
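    An asymmetric contrastive setup of the kind the abstract describes — one direction of the cross-modal loss sees an enlarged negative set built from prompt captions, the other keeps only standard in-batch negatives — might be sketched as below (feature shapes and the split of negatives are our assumptions, not PCNet's actual ACCL):

```python
import numpy as np

def nce_loss(anchor, pos, negs, tau=0.1):
    """InfoNCE with one positive and a stack of negatives (unit-normalized)."""
    norm = lambda x: x / np.linalg.norm(x, axis=-1, keepdims=True)
    a, p, n = norm(anchor), norm(pos), norm(negs)
    pos_s = np.exp(a @ p / tau)          # positive-pair score
    neg_s = np.exp(n @ a / tau).sum()    # summed negative scores
    return -np.log(pos_s / (pos_s + neg_s))

rng = np.random.default_rng(0)
D = 16
video = rng.normal(size=D)           # grounded moment feature
query = rng.normal(size=D)           # its paired query sentence
in_batch = rng.normal(size=(4, D))   # standard in-batch text negatives
prompts = rng.normal(size=(6, D))    # prompt captions of other videos: extra negatives

# Asymmetry: the video->text direction contrasts against the enlarged set...
v2t = nce_loss(video, query, np.vstack([in_batch, prompts]))
# ...while text->video keeps only in-batch (hypothetical) video negatives.
t2v = nce_loss(query, video, rng.normal(size=(4, D)))
loss = v2t + t2v
```

Adding negatives on only one side lets the model exploit the abundant but noisy prompt captions without forcing them to behave as reliable anchors themselves.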

    Vitexin attenuates smoke inhalation induced acute lung injury in rats by inhibiting oxidative stress via PKC β/p66Shc signaling pathway

    Purpose: To investigate the protective effect of vitexin on smoke inhalation-induced acute lung injury (SI-ALI) and the underlying mechanism of action.
    Methods: The ALI rat model was established by inhalation of smoke in a closed smoke chamber. Survival rate, arterial blood gas, wet-to-dry weight ratio of lung tissue, bronchoalveolar lavage fluid protein concentration, lung tissue histology, and oxidative stress and inflammation levels were evaluated. Expression of protein kinase C β (PKC β), p66Shc, and phosphorylated p66Shc was determined by western blot or quantitative reverse transcription-polymerase chain reaction.
    Results: Compared with the smoke inhalation group, vitexin alleviated the decline in arterial partial pressure of oxygen (p < 0.05), reduced lung tissue exudation and pathological lung tissue damage, inhibited the expression of PKC β/p66Shc signaling pathway proteins, downregulated oxidative stress and inflammation, and ultimately improved the survival rate of SI-ALI rats (p < 0.05).
    Conclusion: Vitexin attenuates SI-ALI in rats by alleviating oxidative stress via inhibition of the PKC β/p66Shc signaling pathway, making it a potential agent for the treatment of SI-ALI.

    Metagenomic sequencing for identifying pathogen-specific circulating DNAs and development of diagnostic methods for schistosomiasis

    Summary: Timely diagnosis of Schistosoma infection, particularly in the early stage, is crucial for identifying infected hosts and deploying effective control strategies. Here, metagenomic next-generation sequencing was used to identify pathogen-specific circulating DNAs (cDNAs) in the sera/plasma of New Zealand rabbits infected with S. japonicum, and the identified cDNAs were validated by PCR and qPCR. Loop-mediated isothermal amplification (LAMP)-based CRISPR-Cas12a and recombinase polymerase amplification-based lateral flow strip (RPA-LF) methods, combined with the newly identified cDNAs, were developed to evaluate their potential for diagnosing murine and human schistosomiasis. Twenty-two cDNAs were identified. The developed LAMP-based CRISPR/Cas12a and RPA-LF methods showed good potential for diagnosing murine or human schistosomiasis as early as 5 days post-infection with as few as 5 cercariae. In short, S. japonicum-specific cDNAs in the circulation of infected hosts could serve as effective biomarkers for detecting Schistosoma infection, particularly at early stages.